Probability and Statistics: The Science of Uncertainty: Statistics as Random Variables: The Sampling Distribution

In statistical inference, we move from individual data points to a **statistic**: a function $Y = h(X_1, X_2, \dots, X_n)$ of the sample. Because the sample consists of random variables, the statistic is itself a random variable, and its probability distribution is called the **sampling distribution**.

The Statistic as a Mapping

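A statistic is formally a function $h: \mathbb{R}^n \to \mathbb{R}$ applied to the sample. The probability that the statistic falls in a set $B$ is defined through the pre-image of $B$ under $h$:

$$h^{-1} B = \{(x_1, x_2, \dots, x_n) : h(x_1, x_2, \dots, x_n) \in B\}$$

The event $\{Y \in B\}$ is the same as the event that the sample lands in $h^{-1}B$, so

$$P(Y \in B) = P\big((X_1, X_2, \dots, X_n) \in h^{-1} B\big)$$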

The I.I.D. Foundation

For a sample of i.i.d. (independent and identically distributed) discrete random variables with common probability function $p_X$, the joint probability of a specific sample point $(x_1, \dots, x_n)$ is the product of the marginal probabilities: $p_X(x_1)p_X(x_2)\cdots p_X(x_n)$. This product is the weight attached to each sample point. The probability that the statistic takes a specific value $y$ is the sum of these weights over all sample points with $h(x_1, \dots, x_n) = y$.
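The weighting scheme just described can be sketched in code. The following is a minimal illustration, not a definitive implementation: the function name `exact_sampling_distribution` and the fair-coin population are assumptions introduced here, and brute-force enumeration is only practical for small $n$ and small supports.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def exact_sampling_distribution(pmf, h, n):
    """Return {value of h: probability} for an i.i.d. sample of size n.

    pmf maps each population value to its probability (hypothetical helper).
    """
    dist = defaultdict(Fraction)  # missing keys start at Fraction(0)
    # Enumerate every sample point (x_1, ..., x_n) in the n-fold product.
    for point in product(pmf, repeat=n):
        # Independence: the weight of a point is the product of its marginals.
        weight = Fraction(1)
        for x in point:
            weight *= pmf[x]
        # Every point in the pre-image of h(point) adds its weight to that value.
        dist[h(*point)] += weight
    return dict(dist)


# Assumed example population: a fair coin coded as 0/1, statistic = sample total.
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}
total_dist = exact_sampling_distribution(coin, lambda *xs: sum(xs), n=3)
for value in sorted(total_dist):
    print(value, total_dist[value])
```

Using `Fraction` keeps the probabilities exact, which makes it easy to confirm that the weights sum to 1.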

Example 4.1.1: The Geometric Mean

Consider a discrete population where $p_X(1) = 1/2$, $p_X(2) = 1/4$, and $p_X(3) = 1/4$. We draw a sample of size $n=2$ ($X_1, X_2$) and define our statistic as the geometric mean: $Y_2 = (X_1 X_2)^{1/2}$.

To find the distribution of $Y_2$, we list all 9 possible pairs $(x_1, x_2)$, compute the joint probability of each, and record the resulting value of $Y_2$. Pairs that give the same value are grouped on one row:

Pairs $(x_1, x_2)$	Probability $p_X(x_1)p_X(x_2)$	$Y_2 = \sqrt{x_1 x_2}$
(1, 1)	1/4	1.000
(1, 2), (2, 1)	1/8 + 1/8 = 1/4	1.414
(1, 3), (3, 1)	1/8 + 1/8 = 1/4	1.732
(2, 2)	1/16	2.000
(2, 3), (3, 2)	1/16 + 1/16 = 1/8	2.449
(3, 3)	1/16	3.000
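The table above can be reproduced by direct enumeration. This is a minimal sketch under stated assumptions: the variable names (`p_X`, `prob_by_product`) are illustrative, and the values are keyed by the integer product $x_1 x_2$ rather than by its square root so that equal values of $Y_2$ merge without floating-point comparison issues.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

# Population from Example 4.1.1.
p_X = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 4)}

# Accumulate the joint probability of each pair under the key x1 * x2.
prob_by_product = {}
for x1, x2 in product(p_X, repeat=2):
    key = x1 * x2
    weight = p_X[x1] * p_X[x2]  # independence: product of marginals
    prob_by_product[key] = prob_by_product.get(key, Fraction(0)) + weight

# Report Y_2 = sqrt(x1 * x2) alongside its exact probability.
for key in sorted(prob_by_product):
    print(f"y = {sqrt(key):.3f}   P(Y_2 = y) = {prob_by_product[key]}")
```

Keying by the product works here because the square root is one-to-one on the positive integers, so each distinct product corresponds to exactly one value of $Y_2$.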

Exact vs. Asymptotic Distributions

Before moving to limit theorems such as the Central Limit Theorem (CLT), we must first understand the exact distribution of a statistic: its probability mass or density function for a given finite $n$. When the exact form becomes analytically intractable, we turn to simulation-based methods such as **Monte Carlo approximation**.
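To illustrate the Monte Carlo idea on a case where the exact answer is known, the sketch below simulates the geometric mean of Example 4.1.1 for $n = 2$ and estimates its probability function by relative frequencies. The function name `simulate_geometric_mean_pmf`, the seed, and the number of replications are assumptions chosen for illustration only.

```python
import random
from collections import Counter


def simulate_geometric_mean_pmf(n_reps, seed=0):
    """Estimate the pmf of Y_2 = sqrt(X1 * X2) by simulation.

    Results are keyed by the integer product x1 * x2 (hypothetical helper).
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    values = [1, 2, 3]
    weights = [0.5, 0.25, 0.25]  # p_X(1), p_X(2), p_X(3) from Example 4.1.1
    counts = Counter()
    for _ in range(n_reps):
        # Draw an i.i.d. sample of size 2 from the population.
        x1, x2 = rng.choices(values, weights=weights, k=2)
        counts[x1 * x2] += 1
    # Relative frequencies approximate the exact probabilities.
    return {prod: count / n_reps for prod, count in counts.items()}


estimate = simulate_geometric_mean_pmf(100_000)
for prod in sorted(estimate):
    print(f"y = {prod ** 0.5:.3f}   estimated P(Y_2 = y) = {estimate[prod]:.4f}")
```

With this many replications the estimates should land close to the exact probabilities for this example, and the same approach carries over to statistics whose exact distribution is not available in closed form.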

🎯 Core Principle

A sampling distribution is the distribution of a random variable corresponding to a function of some i.i.d. sequence. It is the bridge between raw data and scientific inference.

QUESTION 1

Suppose that $X_1, X_2, X_3$ are i.i.d. from the distribution in Example 4.1.1. What is the probability that the geometric mean $Y_3 = (X_1 X_2 X_3)^{1/3}$ is equal to 1?

$1/2$

$1/4$

$1/8$

$1/27$

QUESTION 2

A fair six-sided die is tossed $n = 2$ independent times. Which of the following is the probability that the sample mean is exactly 1.5?

$1/36$

$2/36$

$3/36$

$1/6$

QUESTION 3

In an urn with proportion $p$ of chips labelled 0 and $1-p$ labelled 1, a sample of $n=2$ is drawn with replacement. What is the probability that the sample mean is 0.5?

$p^2$

$(1-p)^2$

$2p(1-p)$

$p(1-p)$

QUESTION 4

Which mathematical construct represents the set of all sample points $(x_1, \dots, x_n)$ for which the value of the statistic $h(x_1, \dots, x_n)$ falls in a specified set $B$?

The Joint Density Function

The Pre-image $h^{-1}B$

The Moment Generating Function

The Expected Value Mapping

QUESTION 5

When approximating the integral $\int_{-\infty}^{\infty} \cos^2(x)e^{-x^2/2} dx$ via Monte Carlo, which distribution should you sample from to simplify the calculation?

Uniform(0, 1)

Poisson(1)

Standard Normal $N(0, 1)$

Exponential(1)